Direction of negative curvature for regularized SQP 
1. Introduction



This note discusses the computation and use of a direction of negative curvature in the regularized sequential quadratic programming primal-dual augmented Lagrangian method (pdSQP) of Gill and Robinson [7], [8], with the aim of ensuring convergence to second-order optimal points. Section 2 discusses how to compute a direction of negative curvature using appropriate matrix factorizations. Section 3 describes the specific changes to the algorithm. Section 4 discusses the changes to the convergence results established by Gill and Robinson [8], showing that the desired convergence results continue to hold. Section 5 discusses global convergence to points satisfying the second-order necessary optimality conditions.

2. Direction of negative curvature 

2.1. The active-set estimate 

An index set $\mathcal{W}_k$ is maintained that consists of the indices of the variables estimated to lie on their bounds. This set determines the space in which the directions of negative curvature are calculated. The tolerance for an index to be in $\mathcal{W}_k$ must converge to zero. A test such as $i \in \mathcal{W}_k$ if $[x_k]_i < \min\{\mu_k, \epsilon_a\}$ would be appropriate for forming a single $\mathcal{W}_k$ that is used for convexification, for initializing the QP, and for obtaining a direction of negative curvature. Otherwise, it would be necessary to use three different factorizations.
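The threshold test can be sketched as below. This is a minimal illustration, assuming the iterate is a NumPy vector with nonnegative entries and that `mu` and `eps_a` play the roles of $\mu_k$ and $\epsilon_a$; the function name `estimate_active_set` is hypothetical. Because the threshold is tied to $\mu_k$, it shrinks to zero whenever $\mu_k$ does.

```python
import numpy as np

def estimate_active_set(x, mu, eps_a):
    """Split indices into estimated-active W and estimated-free F.

    Index i goes into W when x[i] < min(mu, eps_a); every other index is free.
    """
    threshold = min(mu, eps_a)
    W = np.flatnonzero(x < threshold)
    F = np.flatnonzero(x >= threshold)
    return W, F

x_k = np.array([1e-9, 0.5, 2e-4, 3.0])
W, F = estimate_active_set(x_k, mu=1e-3, eps_a=1e-2)   # threshold is 1e-3
```

The same `W` would then be reused for convexification, for the QP initialization, and for the curvature computation, which is what avoids the three separate factorizations.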
2.2. Calculating the direction 

Recall that in pdSQP, the QP must use a Lagrangian Hessian $H$ such that $H + \frac{1}{\mu}J^TJ$ is positive definite. The process for forming the requisite $H$, as well as for calculating a direction of negative curvature, begins with the inertia-controlling factorization of the KKT matrix (see Forsgren [4]). Consider the KKT matrix

$$K_F = \begin{pmatrix} H_F & J_F^T \\ J_F & -\mu I \end{pmatrix}, \qquad (2.1)$$

with $F$ the set of estimated free variables (those not in $\mathcal{W}_k$), $H_F$ the rows and columns of $H$ indexed by $F$, $J_F$ the columns of $J$ indexed by $F$, and $I$ the identity matrix whose dimension equals the number of rows of $J_F$.

The algorithm begins an $LBL^T$ factorization of the KKT matrix, where $L$ is lower triangular and $B$ is symmetric block diagonal with $1\times 1$ and $2\times 2$ diagonal blocks. Standard pivoting strategies are described in the literature (see Bunch and Parlett [2], Fletcher [3], and Bunch and Kaufman [1]). Let the lower-right block be denoted by $D = -\mu I$.

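At step $k$ of the factorization, let the partially factorized matrix have the following structure:

$$\begin{pmatrix} L_1 & 0 \\ L_2 & I \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & A \end{pmatrix} \begin{pmatrix} L_1^T & L_2^T \\ 0 & I \end{pmatrix},$$

with $L_1$ lower triangular, $I$ the identity of appropriate size, and $A$ the matrix remaining to be factorized. Let $A$ be partitioned as $A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix}$. If the top-left element $a$ is chosen as a $1\times 1$ pivot, then at the next step

$$A = \begin{pmatrix} 1 & 0 \\ a^{-1}b & I \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & C - b a^{-1} b^T \end{pmatrix} \begin{pmatrix} 1 & a^{-1}b^T \\ 0 & I \end{pmatrix}.$$

Let $S = C - b a^{-1} b^T$ be the Schur complement of the factorization. The matrix $S$ is factorized at the next step.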
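A single $1\times 1$ pivot step can be checked in a few lines. The sketch below uses plain NumPy with no pivoting strategy and a matrix chosen only for illustration (`one_by_one_pivot` is a hypothetical name): it eliminates the top-left entry, forms $S = C - ba^{-1}b^T$, and reassembles $A$ from the factors.

```python
import numpy as np

def one_by_one_pivot(A):
    """Eliminate the (0, 0) entry of symmetric A; return (l, a, S)."""
    a = A[0, 0]
    b = A[1:, 0]
    C = A[1:, 1:]
    l = b / a                     # new column of L below the unit diagonal
    S = C - np.outer(b, b) / a    # Schur complement S = C - b a^{-1} b^T
    return l, a, S

A = np.array([[4.0, 2.0, -1.0],
              [2.0, -3.0, 0.5],
              [-1.0, 0.5, 1.0]])
l, a, S = one_by_one_pivot(A)

# Reassemble A = L B L^T to confirm the update.
L = np.eye(3)
L[1:, 0] = l
B = np.zeros((3, 3))
B[0, 0] = a
B[1:, 1:] = S
A_rebuilt = L @ B @ L.T
```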

For inertia control, this factorization has two stages. In the first stage, the factorization is restricted to pivots of type $H^+$, $D^-$, or $HD$. This means that a diagonal element of $H$ is selected such that $H_{ii} > 0$, a diagonal element of $D$ is selected, or $(i_1, i_2, j_1, j_2)$ is selected such that $(i_1, j_1)$ is an element of $H$, $(i_2, j_2)$ is an element of $D$, and $S_k[(i_1, i_2), (j_1, j_2)]$ has mixed eigenvalues (one positive and one negative). This procedure continues until no such pivots remain.

The KKT matrix can be partitioned as

$$K_F = \begin{pmatrix} H_{11} & H_{12} & J_1^T \\ H_{21} & H_{22} & J_2^T \\ J_1 & J_2 & -\mu I \end{pmatrix},$$

where all of the pivots have come from the rows and columns of $H_{11}$, $J_1$, and $-\mu I$.
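At the end of the first stage, after a symmetric permutation that places the rows and columns of $H_{22}$ last, the factorization can be written as

$$\begin{pmatrix} L_1 & 0 \\ L_2 & I \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & H_{22} - K_{21}K_{11}^{-1}K_{12} \end{pmatrix} \begin{pmatrix} L_1^T & L_2^T \\ 0 & I \end{pmatrix}, \qquad (2.2)$$

where $K_{11} = \begin{pmatrix} H_{11} & J_1^T \\ J_1 & -\mu I \end{pmatrix}$ is the block that has been pivoted on and $K_{21} = K_{12}^T = \begin{pmatrix} H_{21} & J_2^T \end{pmatrix}$. Let $S = H_{22} - K_{21}K_{11}^{-1}K_{12}$. Proposition 3 of Forsgren [4] shows that if $\delta I$ is added to $H_{22}$ with $\delta > \|S\|$, then $K_F$ has the correct inertia. In practice this $\delta$ is excessively large for the purpose of constructing a matrix with the required eigenvalues, but the result does show that such a constant exists.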
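Proposition 3 can be illustrated numerically. The sketch below uses random data invented for illustration, with $H_{11}$ made positive definite so that the pivoted block $K_{11}$ has the inertia the first stage produces; it forms $S$, shifts $H_{22}$ by some $\delta > \|S\|_2$, and counts the eigenvalue signs of the shifted KKT matrix. Because the Schur complement of $-\mu I$ in $K_F$ is $H + \frac{1}{\mu}J^TJ$, Haynsworth's inertia additivity gives $K_F$ the inertia $(|F|, m, 0)$, with $m$ the number of rows of $J$, exactly when $H + \frac{1}{\mu}J^TJ$ is positive definite, so the sketch checks both. The helpers `inertia` and `kkt` are hypothetical names.

```python
import numpy as np

def inertia(M, tol=1e-10):
    """Return (#positive, #negative, #zero) eigenvalues of symmetric M."""
    ev = np.linalg.eigvalsh(M)
    return int(np.sum(ev > tol)), int(np.sum(ev < -tol)), int(np.sum(np.abs(ev) <= tol))

def kkt(H, J, mu):
    m = J.shape[0]
    return np.block([[H, J.T], [J, -mu * np.eye(m)]])

rng = np.random.default_rng(0)
n1, n2, m, mu = 3, 2, 2, 0.1
G = rng.standard_normal((n1, n1))
H11 = G @ G.T + np.eye(n1)          # positive definite, so K11 has inertia (n1, m, 0)
H12 = rng.standard_normal((n1, n2))
H22 = -5.0 * np.eye(n2)             # indefinite trailing block
H = np.block([[H11, H12], [H12.T, H22]])
J = rng.standard_normal((m, n1 + n2))
J1, J2 = J[:, :n1], J[:, n1:]

# First-stage Schur complement S = H22 - K21 K11^{-1} K12.
K11 = np.block([[H11, J1.T], [J1, -mu * np.eye(m)]])
K21 = np.hstack([H12.T, J2.T])
S = H22 - K21 @ np.linalg.solve(K11, K21.T)

delta = np.linalg.norm(S, 2) + 1.0  # any delta > ||S||_2
H_mod = H.copy()
H_mod[n1:, n1:] += delta * np.eye(n2)

inertia_after = inertia(kkt(H_mod, J, mu))
min_eig_reg = np.linalg.eigvalsh(H_mod + J.T @ J / mu).min()
```

As the text notes, this $\delta$ is usually far larger than needed; the sketch only demonstrates that it suffices.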

Instead of proceeding to the second stage of the factorization, the procedure of Lemma 2.4 in Forsgren et al. [6] is applied to $S$ to compute $u$, a direction of negative curvature for $S$. The procedure to calculate $u$ is as follows. Let $\rho = \max_{i,j}|S_{ij}|$, with $|S_{qr}| = \rho$. Define $u$ as the solution of

$$\begin{pmatrix} L_1^T & L_2^T \\ 0 & I \end{pmatrix} u = \begin{pmatrix} 0 \\ h \end{pmatrix}, \qquad (2.3)$$

where

$$h = \begin{cases} e_q & \text{if } q = r, \\ \frac{1}{\sqrt{2}}\big(e_q - \operatorname{sgn}(S_{qr})\, e_r\big) & \text{otherwise.} \end{cases}$$

This $u$ satisfies $u^T S u \le \gamma\,\lambda_{\min}(S)\,\|u\|^2$, with $\gamma$ independent of $S$.
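The choice of $h$ can be sketched as follows. The sketch assumes that no diagonal entry of $S$ is positive, which holds once the first stage stops (no positive diagonal pivot from the $H$ block remains). Under that assumption $h^TSh \le -\rho$ in both cases, and since $\lambda_{\min}(S) \ge -\|S\|_2 \ge -n_S\rho$ for an $n_S \times n_S$ matrix, $\gamma = 1/n_S$ is one constant for which $h^TSh \le \gamma\lambda_{\min}(S)\|h\|^2$. This illustrates the bound for $h$ itself and is not a reproduction of Lemma 2.4 of Forsgren et al. [6]; `curvature_vector` is a hypothetical name.

```python
import numpy as np

def curvature_vector(S):
    """Build h from the largest-magnitude entry S[q, r] of symmetric S."""
    q, r = np.unravel_index(np.argmax(np.abs(S)), S.shape)
    h = np.zeros(S.shape[0])
    if q == r:
        h[q] = 1.0
    else:
        h[q] = 1.0 / np.sqrt(2.0)
        h[r] = -np.sign(S[q, r]) / np.sqrt(2.0)
    return h

S = np.array([[-1.0, 3.0, 0.0],
              [3.0, -2.0, 1.0],
              [0.0, 1.0, 0.0]])   # nonpositive diagonal, as after the first stage
h = curvature_vector(S)
curv = h @ S @ h                              # h^T S h
gamma = 1.0 / S.shape[0]                      # one admissible gamma (assumption above)
bound = gamma * np.linalg.eigvalsh(S).min() * (h @ h)
```

Here the largest entry is off the diagonal, so $h = (e_1 - e_2)/\sqrt{2}$ and $h^TSh = (S_{11} + S_{22})/2 - \rho = -4.5$.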
The following bounds are important for the subsequent second-order convergence theory.

Lemma 2.1. Let $u$ be defined as in (2.3), $S$ be the Schur complement of the partially factorized matrix (2.2), $J_F$ and $H_F$ be defined as in (2.1), and $Z$ be a matrix whose columns form an orthonormal basis for the null space of $J_F$. Then

$$\frac{u^T S u}{\gamma\|u\|^2} \le \lambda_{\min}(S) \le \lambda_{\min}\Big(H_F + \frac{1}{\mu}J_F^T J_F\Big) \le \lambda_{\min}(Z^T H_F Z).$$

Proof. Lemma 2.4 in Forsgren et al. [6] directly implies that $u^T S u/(\gamma\|u\|^2) \le \lambda_{\min}(S)$.

The proof that $\lambda_{\min}(S) \le \lambda_{\min}(H_F + \frac{1}{\mu}J_F^TJ_F)$ is given in the proof of Theorem 4.5 in Forsgren and Gill [5]. For the final inequality, let $w = Zv$, where $Z^T H_F Z v = \lambda_{\min}(Z^T H_F Z)\,v$ and $\|v\| = 1$. Then $J_F w = 0$ and $\|w\| = \|v\| = 1$, so

$$\lambda_{\min}\Big(H_F + \frac{1}{\mu}J_F^TJ_F\Big) \le \frac{w^T(H_F + \frac{1}{\mu}J_F^TJ_F)w}{w^Tw} = w^T H_F w = v^T Z^T H_F Z v = \lambda_{\min}(Z^T H_F Z). \qquad \blacksquare$$
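The last two inequalities of Lemma 2.1 can be checked numerically on a small random instance. In this sketch (data invented for illustration) $H_{22}$ is made negative enough that $S$ is negative definite, which is the situation in which a direction of negative curvature is sought; $Z$ is taken from the SVD of $J$, assuming $J$ has full row rank, so its columns are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, m, mu = 3, 2, 2, 0.1
G = rng.standard_normal((n1, n1))
H11 = G @ G.T + np.eye(n1)
H12 = rng.standard_normal((n1, n2))
J = rng.standard_normal((m, n1 + n2))
J1, J2 = J[:, :n1], J[:, n1:]
# S <= H22 + J2^T J2 / mu, so this shift forces S to be negative definite.
shift = 1.0 + np.linalg.norm(J2, 2) ** 2 / mu
H22 = -shift * np.eye(n2)
H = np.block([[H11, H12], [H12.T, H22]])

K11 = np.block([[H11, J1.T], [J1, -mu * np.eye(m)]])
K21 = np.hstack([H12.T, J2.T])
S = H22 - K21 @ np.linalg.solve(K11, K21.T)

# Orthonormal null-space basis of J: trailing right singular vectors.
_, _, Vt = np.linalg.svd(J)
Z = Vt[m:].T

lam_S = np.linalg.eigvalsh(S).min()
lam_M = np.linalg.eigvalsh(H + J.T @ J / mu).min()
lam_Z = np.linalg.eigvalsh(Z.T @ H @ Z).min()
```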

fJL W 1 W 



3. Implementing Directions of Negative Curvature 
3.1. Step of negative curvature 

Several changes must be made to the algorithm of Gill and Robinson [8]. To minimize the number of factorizations, the computation of the direction of negative curvature should be followed by a test of second-order optimality. In addition, the direction of negative curvature must be bounded and must be a feasible direction with respect to both the linearized equality constraints and the bound constraints. Finally, the line search must be extended to allow for this additional step of negative curvature.

In the description below, the subscript $k$ denoting the step number in the sequence of iterations is suppressed.

The following procedure satisfies these requirements. 

1. The first step computes the direction of negative curvature for the free KKT matrix as described in Section 2, denoted by $u_F$, and then defines $u$ by $[u]_F = u_F$ and $[u]_A = 0$, where $A$ denotes the estimated active set $\mathcal{W}$. If no such direction of negative curvature exists, then $u$ is set to zero.

2. The second step uses $u$ in a test of second-order optimality, as described in Section 3.2.
3. The change in the multipliers corresponding to $u$ is defined as $w = -\frac{1}{\mu}Ju$. This ensures that the linearized equality constraints remain satisfied, since

$$Jp + c + \mu q = J(p + u) + c + \mu\Big(q - \frac{1}{\mu}Ju\Big) = J(p + u) + c + \mu(q + w).$$

The resulting $(u, w)$ is shown in Section 3.3 to be a direction of negative curvature for $\nabla^2 M$.

4. Since both $(u, w)$ and $-(u, w)$ are directions of negative curvature, the sign is chosen so that the step is a descent direction for $M$, i.e., $\nabla M^T \begin{pmatrix} u \\ w \end{pmatrix} \le 0$.

5. Compute $\Delta v = (p, q)$, the solution of the convex QP.

6. The direction of negative curvature is scaled so that it is bounded by $\max(u_{\max}, 2\|p\|)$ and, in conjunction with the QP step, satisfies the bound constraints $x \ge 0$. Specifically, $u$ and $w$ are replaced by $\beta u$ and $\beta w$, where

$$\beta = \max\big\{\beta \ge 0 \mid x + p + \beta u \ge 0,\ \|\beta u\| \le \max(u_{\max}, 2\|p\|)\big\}.$$

Note that this implies that if $[x + p]_i = 0$ and $[u]_i < 0$ for some $i$, then $u$ is set to zero.
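Steps 3, 4 and 6 can be sketched together as below. This is a minimal illustration under stated assumptions: the QP step already satisfies $x + p \ge 0$, the merit gradient is supplied split into its $x$ and $y$ parts (`grad_x`, `grad_y`), and the 2-norm is used in the bound. The helper name `scale_curvature_step` and the example data are hypothetical.

```python
import numpy as np

def scale_curvature_step(u, x, p, J, mu, grad_x, grad_y, u_max):
    """Hypothetical helper for steps 3, 4 and 6.

    u      : unscaled curvature direction (already zero on estimated active indices)
    grad_x : x-part of the merit-function gradient; grad_y : y-part
    Returns the scaled (u, w) and the scale factor beta.
    """
    w = -(J @ u) / mu                          # step 3: w = -(1/mu) J u
    if grad_x @ u + grad_y @ w > 0.0:          # step 4: choose the descent sign
        u, w = -u, -w
    norm_u = np.linalg.norm(u)
    if norm_u == 0.0:
        return u, w, 0.0
    # Step 6: norm bound max(u_max, 2||p||) ...
    beta = max(u_max, 2.0 * np.linalg.norm(p)) / norm_u
    # ... and a ratio test for x + p + beta*u >= 0 (x + p >= 0 from the QP).
    xp = x + p
    neg = u < 0.0
    if np.any(neg):
        beta = min(beta, float(np.min(xp[neg] / -u[neg])))
    beta = max(beta, 0.0)
    return beta * u, beta * w, beta

x = np.array([1.0, 0.0, 2.0])
p = np.array([-0.5, 0.0, 0.1])
u = np.array([-1.0, 0.0, 2.0])
J = np.array([[1.0, 1.0, 1.0]])
mu = 0.1
grad_x = np.array([0.2, 0.0, -0.1])
grad_y = np.array([0.0])
u_s, w_s, beta = scale_curvature_step(u, x, p, J, mu, grad_x, grad_y, u_max=1.0)
```

Because $w$ is scaled by the same $\beta$ as $u$, the identity $J u + \mu w = 0$ from step 3 is preserved after scaling.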

3.2. Optimality measures 

Recall that in Gill and Robinson [8], with

$$\phi_S(v) = \eta(x) + 10^{-5}\omega(v) \quad\text{and}\quad \phi_L(v) = 10^{-5}\eta(x) + \omega(v),$$

where

$$\eta(x) = \|c(x)\| \quad\text{and}\quad \omega(x, y) = \big\|\min\big(x,\ g(x) - J(x)^T y\big)\big\|,$$

an iterate is an S-iterate if $\phi_S(v) \le \frac{1}{2}\phi_S^{\max}$ and an L-iterate if $\phi_L(v) \le \frac{1}{2}\phi_L^{\max}$. Otherwise, an iterate is an M-iterate if

$$\|\nabla_y M_\nu(v_{k+1}; y_k^E, \mu_k^R)\| \le \tau_k \quad\text{and}\quad \|\min(x_{k+1}, \nabla_x M_\nu(v_{k+1}; y_k^E, \mu_k^R))\| \le \tau_k.$$

If none of these conditions holds, then the iterate $v_k$ is an F-iterate.
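The classification can be sketched as a short decision function. This is an illustration only: the caller is assumed to supply $c(x)$, $g(x)$, $J(x)$ and the two parts of $\nabla M_\nu$ at the trial point, the S-test is checked before the L-test (an ordering chosen here), and `classify_iterate` is a hypothetical name.

```python
import numpy as np

def classify_iterate(x, y, g, J, c, grad_x_M, grad_y_M, phi_S_max, phi_L_max, tau):
    """Label an iterate 'S', 'L', 'M' or 'F' using the first-order tests."""
    eta = np.linalg.norm(c)                                  # eta(x) = ||c(x)||
    omega = np.linalg.norm(np.minimum(x, g - J.T @ y))       # omega(x, y)
    if eta + 1e-5 * omega <= 0.5 * phi_S_max:                # phi_S test
        return "S"
    if 1e-5 * eta + omega <= 0.5 * phi_L_max:                # phi_L test
        return "L"
    if (np.linalg.norm(grad_y_M) <= tau
            and np.linalg.norm(np.minimum(x, grad_x_M)) <= tau):
        return "M"
    return "F"

x = np.array([1.0, 0.0])
y = np.array([0.5])
g = np.array([0.5, 2.0])
J = np.array([[1.0, 0.0]])
label = classify_iterate(x, y, g, J, np.array([1e-8]), np.zeros(2), np.zeros(1),
                         phi_S_max=1.0, phi_L_max=1.0, tau=1e-6)
```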

In order to force convergence to a second-order optimal point, it is necessary to change the function $\omega(x, y)$ that appears in $\phi_S$ and $\phi_L$, as well as the test for an iterate to be an M-iterate.

Ideally, the minimum eigenvalue of $H$ in the null space of $J_F$ should be computed, as well as the minimum eigenvalue of $\nabla^2_{xx} M$. However, this would require extensive computation. Instead, these quantities are estimated from the curvature along the direction of negative curvature. Recall that

$$\frac{u^T(H + \frac{1}{\mu}J^TJ)u}{\gamma\|u\|^2} \le \lambda_{\min}\Big(H + \frac{1}{\mu}J^TJ\Big),$$

where the subscript $F$ is suppressed. Since $\gamma$ is bounded away from zero and bounded above, if $u^T(H + \frac{1}{\mu}J^TJ)u/\|u\|^2 \to 0$, then this bound implies $\liminf \lambda_{\min}(H + \frac{1}{\mu}J^TJ) \ge 0$. Hence, the test for an M-iterate is changed to:



$$\|\nabla_y M_\nu(v_{k+1}; y_k^E, \mu_k^R)\| \le \tau_k, \qquad \|\min(x_{k+1}, \nabla_x M_\nu(v_{k+1}; y_k^E, \mu_k^R))\| \le \tau_k,$$

and

$$\frac{u_{k+1}^T(H + \frac{1}{\mu}J^TJ)u_{k+1}}{\|u_{k+1}\|^2} \ge -\tau_k.$$

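Similarly, for the filter functions

$$\phi_S(v) = \eta(x) + 10^{-5}\omega(v) \quad\text{and}\quad \phi_L(v) = 10^{-5}\eta(x) + \omega(v),$$

the optimality measures become

$$\eta(x) = \|c(x)\| \quad\text{and}\quad \omega(x, y) = \max\Big(\big\|\min\big(x,\ g(x) - J(x)^T y\big)\big\|,\ -\frac{u^T(H + \frac{1}{\mu}J^TJ)u}{\|u\|^2}\Big),$$

where the curvature ratio is taken to be zero when $u = 0$.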
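The modified measures can be sketched as follows, assuming the convention that the curvature ratio is zero when $u = 0$ and that $H$, $J$ and $\mu$ are the quantities used in the factorization. All function names are hypothetical.

```python
import numpy as np

def curvature_ratio(u, H, J, mu):
    """u^T (H + J^T J / mu) u / ||u||^2, taken as 0 when u = 0."""
    uu = u @ u
    if uu == 0.0:
        return 0.0
    Ju = J @ u
    return (u @ H @ u + (Ju @ Ju) / mu) / uu

def omega_second_order(x, y, g, J, u, H, mu):
    """Stationarity measure augmented with the negated curvature ratio."""
    first_order = np.linalg.norm(np.minimum(x, g - J.T @ y))
    return max(first_order, -curvature_ratio(u, H, J, mu))

def second_order_test(u, H, J, mu, tau):
    """Extra condition for an M-iterate: curvature ratio >= -tau."""
    return curvature_ratio(u, H, J, mu) >= -tau

H = np.diag([1.0, -2.0])
J = np.array([[1.0, 0.0]])
mu = 0.5
u = np.array([0.0, 1.0])                 # lies in null(J) and picks up the eigenvalue -2
omega = omega_second_order(np.array([1.0, 1.0]), np.array([0.0]),
                           np.zeros(2), J, u, H, mu)
```

In this example the point is first-order stationary, so the measure is driven entirely by the curvature term.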



3.3. Merit function 



The line search must also be changed to include the direction of negative curvature. First, it is shown that the primal-dual step $(u, w)$ is a direction of negative curvature for the Hessian of the merit function.

Lemma 3.1. The vector $(u, w)$ defined in Section 3.1 is a direction of negative curvature for $\nabla^2 M$.



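Proof. Consider

$$\begin{pmatrix} u \\ w \end{pmatrix}^T \nabla^2 M \begin{pmatrix} u \\ w \end{pmatrix} = \begin{pmatrix} u \\ w \end{pmatrix}^T \begin{pmatrix} H + \frac{1}{\mu}(1+\nu)J^TJ & \nu J^T \\ \nu J & \nu\mu I \end{pmatrix} \begin{pmatrix} u \\ w \end{pmatrix}$$

$$= \begin{pmatrix} u \\ w \end{pmatrix}^T \begin{pmatrix} Hu + \frac{1}{\mu}(1+\nu)J^TJu + \nu J^Tw \\ \nu Ju + \nu\mu w \end{pmatrix} = u^THu + \frac{1}{\mu}(1+\nu)u^TJ^TJu + 2\nu u^TJ^Tw + \nu\mu\|w\|^2.$$

From the definition in Section 3.1, $u = \beta\bar u$, where $\bar u$ is the unscaled direction and satisfies $\bar u^T(H + \frac{1}{\mu}J^TJ)\bar u \le \gamma\lambda_{\min}(H + \frac{1}{\mu}J^TJ)\|\bar u\|^2$. Multiplying both sides by $\beta^2$ gives $u^T(H + \frac{1}{\mu}J^TJ)u \le \gamma\lambda_{\min}(H + \frac{1}{\mu}J^TJ)\|u\|^2$. Let $\bar\gamma = -\gamma\lambda_{\min}(H + \frac{1}{\mu}J^TJ)$, which is positive when a direction of negative curvature exists. Using $w = -\frac{1}{\mu}Ju$,

$$u^THu + \frac{1}{\mu}(1+\nu)u^TJ^TJu + 2\nu u^TJ^Tw + \nu\mu\|w\|^2 = u^T\Big(H + \frac{1}{\mu}J^TJ\Big)u + \frac{\nu}{\mu}\|Ju\|^2 - \frac{2\nu}{\mu}\|Ju\|^2 + \frac{\nu}{\mu}\|Ju\|^2$$

$$= u^T\Big(H + \frac{1}{\mu}J^TJ\Big)u \le -\bar\gamma\|u\|^2,$$

which is negative whenever $u \ne 0$. $\blacksquare$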
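The cancellation of the $\nu$-dependent terms can be confirmed numerically. The sketch below builds the block matrix displayed in the proof for random data (the data and the name `merit_quadratic_form` are invented for illustration) and checks that, with $w = -\frac{1}{\mu}Ju$, the quadratic form does not depend on $\nu$ and equals $u^T(H + \frac{1}{\mu}J^TJ)u$.

```python
import numpy as np

def merit_quadratic_form(H, J, mu, nu, u):
    """(u, w)^T HM (u, w) with w = -J u / mu and HM the block matrix from the proof."""
    m = J.shape[0]
    w = -(J @ u) / mu
    HM = np.block([[H + (1.0 + nu) / mu * J.T @ J, nu * J.T],
                   [nu * J, nu * mu * np.eye(m)]])
    uw = np.concatenate([u, w])
    return uw @ HM @ uw

rng = np.random.default_rng(3)
n, m, mu = 4, 2, 0.2
A = rng.standard_normal((n, n))
H = 0.5 * (A + A.T)                  # symmetric, generally indefinite
J = rng.standard_normal((m, n))
u = rng.standard_normal(n)
reduced = u @ (H + J.T @ J / mu) @ u
values = [merit_quadratic_form(H, J, mu, nu, u) for nu in (0.0, 0.5, 3.0)]
```

The check reflects the fact that the $\nu$-weighted part of the merit function depends on $(u, w)$ only through $Ju + \mu w$, which vanishes by the choice of $w$.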
For the line search, let $R_k = u_k^T\nabla^2 M_\nu(v_k; y_k^E, \mu_k^R)u_k \le 0$. Define $\alpha_k = 2^{-j}$, with $j$ the smallest nonnegative integer such that

$$M_\nu(v_k + \alpha_k u_k + \alpha_k^2\Delta v_k; y_k^E, \mu_k^R) \le M_\nu(v_k; y_k^E, \mu_k^R) + \alpha_k^2\eta_S N_k + \alpha_k\eta_S R_k. \qquad (3.1)$$

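Letting $\bar\alpha = \min(\alpha_{\min}, \alpha_k)$ and $\bar\mu = \max(\frac{1}{2}\mu_k^R, \mu_{k+1})$, the update for the penalty parameter becomes

$$\mu_{k+1}^R = \begin{cases} \mu_k^R & \text{if } M_\nu(v_{k+1}; y_k^E, \mu_k^R) \le M_\nu(v_k; y_k^E, \mu_k^R) + \bar\alpha\eta_S R_k + \bar\alpha^2\eta_S N_k, \\ \bar\mu & \text{otherwise.} \end{cases} \qquad (3.2)$$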
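The backtracking rule (3.1) and the update (3.2) can be sketched as below. This is a minimal illustration, assuming the merit function is available as a Python callable, that $N_k$ and $R_k$ are precomputed scalars, and that $\eta_S = 0.25$ and $\alpha_{\min} = 0.5$; these values, the toy merit function, and the names `curvilinear_backtrack` and `update_mu_R` are hypothetical.

```python
import numpy as np

def curvilinear_backtrack(merit, v, u, dv, N, R, eta_S=0.25, max_halvings=40):
    """Try alpha = 1, 1/2, 1/4, ... along v + alpha*u + alpha^2*dv until (3.1) holds."""
    M0 = merit(v)
    alpha = 1.0
    for _ in range(max_halvings):
        trial = v + alpha * u + alpha**2 * dv
        if merit(trial) <= M0 + alpha**2 * eta_S * N + alpha * eta_S * R:
            return alpha, trial
        alpha *= 0.5
    raise RuntimeError("no acceptable step found")

def update_mu_R(M_new, M_old, mu_R, mu_next, alpha_k, alpha_min, N, R, eta_S=0.25):
    """Penalty update (3.2): keep mu_R on sufficient decrease, otherwise use mu_bar."""
    a = min(alpha_min, alpha_k)
    if M_new <= M_old + a * eta_S * R + a**2 * eta_S * N:
        return mu_R
    return max(0.5 * mu_R, mu_next)

def merit(v):
    """Toy merit function: Hessian diag(1, -1), gradient (1, 0) at the origin."""
    return 0.5 * (v[0] ** 2 - v[1] ** 2) + v[0]

v0 = np.zeros(2)
u = np.array([0.0, 1.0])      # negative curvature: R = u^T diag(1, -1) u = -1
dv = np.array([-1.0, 0.0])    # descent step: N = grad^T dv = -1
alpha, v1 = curvilinear_backtrack(merit, v0, u, dv, N=-1.0, R=-1.0)
mu1 = update_mu_R(merit(v1), merit(v0), 0.1, 0.01, alpha, 0.5, -1.0, -1.0)
```

The negative curvature step is weighted by $\alpha_k$ and the QP step by $\alpha_k^2$, so for small $\alpha_k$ the path leaves $v_k$ along $u_k$ first.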

4. Consistency with established convergence theory 

In their first-order analysis, Gill and Robinson [8] make the following assumptions: 

Assumption 4.1. Each $H(x_k, y_k)$ is chosen so that the sequence $\{H(x_k, y_k)\}_{k\ge 0}$ is bounded, with $\{H(x_k, y_k) + (1/\mu_k^R)J(x_k)^TJ(x_k)\}_{k\ge 0}$ uniformly positive definite.

Assumption 4.2. The functions f and c are twice continuously differentiable. 

Assumption 4.3. The sequence $\{x_k\}_{k\ge 0}$ is contained in a compact set.

Since $\nabla M_\nu$ does not involve the Hessians of the objective or the constraints, much of the first-order convergence theory continues to hold. Incorporating the direction of negative curvature, Theorem 4.1 changes to:

Theorem 4.1. If there exists an integer $\bar k$ such that $\mu_k^R = \bar\mu^R > 0$ and $k$ is an F-iterate for all $k \ge \bar k$, then the following hold:

1. $\{\|\Delta v_k\| + \|u_k\|\}_{k\ge\bar k}$ is bounded away from zero.

2. There exists an $\epsilon > 0$ such that for all $k \ge \bar k$,

$$\nabla M_\nu(v_k; y^E, \bar\mu^R)^T\Delta v_k \le -\epsilon \quad\text{or}\quad u_k^T\nabla^2 M_\nu(v_k; y^E, \bar\mu^R)u_k \le -\epsilon.$$

Proof. If all iterates $k \ge \bar k$ are F-iterates, then

$$\tau_k = \bar\tau > 0, \quad \mu_k^R = \bar\mu^R, \quad\text{and}\quad y_k^E = y^E \quad\text{for all } k \ge \bar k.$$

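Proof of the first result: Assume the contrary, i.e., that there exists a subsequence $\mathcal{S}_1 \subseteq \{k \mid k \ge \bar k\}$ such that $\lim_{k\in\mathcal{S}_1}\Delta v_k = 0$ and $\lim_{k\in\mathcal{S}_1}u_k = 0$. The solution $\Delta v_k$ of the QP subproblem satisfies

$$H_\nu^M(v_k; \bar\mu^R)\Delta v_k + \nabla M_\nu(v_k; y^E, \bar\mu^R) = \begin{pmatrix} z_k \\ 0 \end{pmatrix} \quad\text{and}\quad 0 = \min(x_k + p_k, z_k).$$

As $H_\nu^M$ is uniformly bounded, eventually, for some sufficiently large $k \in \mathcal{S}_1$, $\Delta v_k$ satisfies the first-order conditions of an M-iterate, i.e.,

$$\|\nabla_y M_\nu(v_{k+1}; y^E, \bar\mu^R)\| \le \tau_k \quad\text{and}\quad \|\min(x_{k+1}, \nabla_x M_\nu(v_{k+1}; y^E, \bar\mu^R))\| \le \tau_k.$$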
In the construction of $u_k$, recall that $\|u_k\|$ is the largest value, subject to an upper bound, for which the step remains feasible. Since the upper bound $\max(u_{\max}, 2\|p_k\|)$ is at least $u_{\max} > 0$, if $\lim_{k\in\mathcal{S}_1}u_k = 0$ then eventually $u_k$ is limited by feasibility or is set to zero.

In the first case, the limiting constraint on $u_k$ must be $x_k + p_k + u_k \ge 0$. Eventually, since $u_k \to 0$ and $p_k \to 0$, if $i$ is a blocking bound for $u_k$, then $[x_k]_i < \min(\mu_k, \epsilon_a)$ and $i \in \mathcal{W}_k$, which implies that $[u_k]_i = 0$, so $i$ cannot be blocking. Hence, by construction and since the set of possible indices is finite, $u_k$ is eventually identically zero. This implies that the second-order condition of an M-iterate is also satisfied trivially, i.e.,

$$\frac{u_{k+1}^T(H + \frac{1}{\mu}J^TJ)u_{k+1}}{\|u_{k+1}\|^2} \ge -\tau_k,$$

so $k$ is an M-iterate and $\mu^R$ is decreased. This contradicts the assumption that $\mu_k^R$ is held fixed at $\bar\mu^R$ for all $k \ge \bar k$.

Proof of part (2): Assume, to the contrary, that there exists a subsequence $\mathcal{S}_2$ of $\{k : k \ge \bar k\}$ such that

$$\lim_{k\in\mathcal{S}_2}\nabla M_\nu(v_k; y^E, \bar\mu^R)^T\Delta v_k = 0 \qquad (4.1)$$

and

$$\lim_{k\in\mathcal{S}_2}u_k^T\nabla^2 M_\nu(v_k; y^E, \bar\mu^R)u_k = 0.$$



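Consider the matrix

$$L_k = \begin{pmatrix} I & 0 \\ -\frac{1}{\bar\mu^R}J_k & I \end{pmatrix}, \quad\text{for which}\quad L_k^{-1}\Delta v_k = \begin{pmatrix} p_k \\ q_k + \frac{1}{\bar\mu^R}J_kp_k \end{pmatrix}.$$

Since $\Delta v = 0$ is feasible and $\Delta v_k$ is a solution of the convex problem, it follows that

$$-\nabla M_\nu(v_k; y^E, \bar\mu^R)^T\Delta v_k \ge \tfrac12\Delta v_k^T H_\nu^M(v_k; \bar\mu^R)\Delta v_k = \tfrac12\Delta v_k^T L_k^{-T}L_k^T H_\nu^M(v_k; \bar\mu^R)L_kL_k^{-1}\Delta v_k$$

$$= \tfrac12\begin{pmatrix} p_k \\ q_k + \frac{1}{\bar\mu^R}J_kp_k \end{pmatrix}^T \begin{pmatrix} H_k + \frac{1}{\bar\mu^R}J_k^TJ_k & 0 \\ 0 & \nu\bar\mu^R I \end{pmatrix} \begin{pmatrix} p_k \\ q_k + \frac{1}{\bar\mu^R}J_kp_k \end{pmatrix}.$$

Since $H_k + \frac{1}{\bar\mu^R}J_k^TJ_k$ is uniformly positive definite by Assumption 4.1,

$$\Delta v_k^T L_k^{-T}L_k^T H_\nu^M(v_k; \bar\mu^R)L_kL_k^{-1}\Delta v_k \ge \lambda_{\min}\|p_k\|^2 + \nu\bar\mu^R\Big\|q_k + \frac{1}{\bar\mu^R}J_kp_k\Big\|^2$$

for some $\lambda_{\min} > 0$. Combining this with (4.1), it follows that

$$\lim_{k\in\mathcal{S}_2}p_k = \lim_{k\in\mathcal{S}_2}\Big(q_k + \frac{1}{\bar\mu^R}J_kp_k\Big) = 0,$$

in which case, since $J_k$ is bounded by Assumptions 4.2 and 4.3, $\lim_{k\in\mathcal{S}_2}q_k = 0$. Hence $\Delta v_k \to 0$ for $k \in \mathcal{S}_2$.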
Since $\lim_{k\in\mathcal{S}_2}u_k^T\nabla^2 M_\nu(v_k; y^E, \bar\mu^R)u_k = 0$, there exists $k_2$ such that, for all $k \in \mathcal{S}_2$ with $k \ge k_2$, either $u_k^T\nabla^2 M_\nu(v_k; y^E, \bar\mu^R)u_k/\|u_k\|^2 \ge -\bar\tau$, or $u_k \to 0$. The former, by the same argument as in part (1), together with $\Delta v_k \to 0$, implies that eventually $k$ is an M-iterate. The latter, together with $\lim_{k\in\mathcal{S}_2}\Delta v_k = 0$, contradicts part (1) of the theorem. Hence part (2) must hold. $\blacksquare$
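The block diagonalization used in the proof of part (2) can be verified numerically. The sketch below uses random data invented for illustration, builds $H_\nu^M$ with the block structure from Lemma 3.1 and the matrix $L_k$, and checks that $L_k^TH_\nu^ML_k = \operatorname{diag}(H_k + \frac{1}{\mu}J_k^TJ_k,\ \nu\mu I)$ and that $L_k^{-1}$ maps $(p, q)$ to $(p, q + \frac{1}{\mu}Jp)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, mu, nu = 3, 2, 0.3, 0.7
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)
J = rng.standard_normal((m, n))

HM = np.block([[H + (1.0 + nu) / mu * J.T @ J, nu * J.T],
               [nu * J, nu * mu * np.eye(m)]])
L = np.block([[np.eye(n), np.zeros((n, m))],
              [-J / mu, np.eye(m)]])
D = L.T @ HM @ L                                       # expected to be block diagonal

p = rng.standard_normal(n)
q = rng.standard_normal(m)
transformed = np.linalg.solve(L, np.concatenate([p, q]))   # L^{-1} (p, q)
```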

The proofs of the first result of Theorem 4.1 and of Theorem 4.2 of Gill and Robinson [8] do not change.

5. Global convergence to second-order optimal points 
5.1. Filter Convergence 

Definition 5.1. The Weak Constant Rank (WCR) condition holds at $x$ if there is a neighborhood $\mathcal{N}(x)$ for which the rank of $\begin{pmatrix} J(z) \\ E_{\mathcal{A}}^T \end{pmatrix}$ is constant for all $z \in \mathcal{N}(x)$, where $E_{\mathcal{A}}$ consists of the columns of the identity corresponding to the indices of the variables active at $x$ (i.e., $i \in \mathcal{A}$ if $x_i = 0$).
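A sampling heuristic can give some evidence about WCR at a given point, although finitely many samples cannot prove that the rank is constant on a neighborhood. The sketch below assumes a user-supplied callable returning $J(z)$ as a 2-D array; the sampling radius, sample count, and the names `wcr_rank_samples`, `jac_square` and `jac_shifted` are hypothetical. The first toy constraint, $c(x) = x_0^2$ at $(0, 1)$ with $x_0$ active, has constant rank 1 even though the gradients are linearly dependent there; the second, $c(x) = (x_0 - 1)^2$ at $(1, 1)$, has rank 0 at the point and rank 1 nearby, so WCR fails.

```python
import numpy as np

def wcr_rank_samples(jac, x, active, radius=1e-3, samples=20, seed=0):
    """Rank of [J(z); E_A^T] at z = x and at random points near x.

    Heuristic evidence only: finitely many samples cannot prove constant rank.
    """
    rng = np.random.default_rng(seed)
    E_A_T = np.eye(x.size)[np.asarray(active, dtype=int)]   # rows e_i^T, i active
    points = [x] + [x + radius * rng.uniform(-1.0, 1.0, size=x.size)
                    for _ in range(samples)]
    return [int(np.linalg.matrix_rank(np.vstack([jac(z), E_A_T]))) for z in points]

def jac_square(z):
    """Jacobian of c(x) = x_0^2 (one constraint, two variables)."""
    return np.array([[2.0 * z[0], 0.0]])

def jac_shifted(z):
    """Jacobian of c(x) = (x_0 - 1)^2."""
    return np.array([[2.0 * (z[0] - 1.0), 0.0]])

ranks_wcr = wcr_rank_samples(jac_square, np.array([0.0, 1.0]), active=[0])
ranks_no_wcr = wcr_rank_samples(jac_shifted, np.array([1.0, 1.0]), active=[])
```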

Theorem 5.1. Assume there is a subsequence $\{v_k\}$ of S- and L-iterates converging to $v^* = (x^*, y^*)$, where $v^*$ satisfies the first-order KKT conditions. Furthermore, assume that MFCQ and WCR hold at $x^*$. Then $x^*$ satisfies the second-order necessary optimality conditions.

Proof. Let $d \in T(x^*) = \{d \mid J(x^*)d = 0 \text{ and } E_{\mathcal{A}^*}^T d = 0\}$ with $\|d\| = 1$. By Lemma 3.1 of Andreani et al. [9], there exists a sequence $\{d_k\}$ such that $d_k \in T(x_k)$ and $d_k \to d$, where

$$T(x_k) = \{d \mid J(x_k)d = 0 \text{ and } E_{\mathcal{A}^*}^T d = 0\}.$$

Without loss of generality, we may let $\|d_k\| = 1$. Since $x_k \to x^*$, eventually $\mathcal{W}_k = \mathcal{A}^*$, where $\mathcal{A}^*$ is the active set at $x^*$. Then, by the definition of the S- and L-iterates and Lemma 2.1,

$$d_k^T\Big(\nabla^2 f(x_k) - \sum_i [y_k]_i\nabla^2 c_i(x_k)\Big)d_k \ge \lambda_{\min}(Z_k^TH_kZ_k) \ge -\xi_k,$$

where $0 \le \xi_k \to 0$. Taking limits, it follows that $d^T\big(\nabla^2 f(x^*) - \sum_i y_i^*\nabla^2 c_i(x^*)\big)d \ge 0$.